Estimating query rewriting quality over LOD

Authors

  • Ana I. Torre-Bastida
  • Jesús Bermúdez
  • Arantza Illarramendi
Abstract

It is becoming increasingly necessary to query data stored in different publicly accessible datasets, such as those in the Linked Data environment, in order to gather as much information as possible on distinct topics. However, users have difficulty querying datasets that use different vocabularies and data structures. For this reason, it is worthwhile to develop systems that can produce query rewritings on demand. Such systems often cannot guarantee a semantics-preserving rewriting because of the heterogeneity of the vocabularies, and it is at this point that estimating the quality of the produced rewriting becomes crucial. In this paper we present a novel framework that, given a query written in the vocabulary the user is more familiar with, rewrites the query in terms of the vocabulary of a target dataset. It also reports the quality of the rewritten query with two scores: first, a similarity factor based on the rewriting process itself, and second, a quality score provided by a predictive model. This model is constructed by a machine learning algorithm that learns from a set of queries and their intended (gold standard) rewritings. The feasibility of the framework has been validated in a real scenario.

Similar resources

Quality-driven Integration of Heterogenous Information Systems

Integrated access to information that is spread over multiple, distributed, and heterogeneous sources is an important problem in many scientific and commercial domains. While much work has been done on query processing and choosing plans under cost criteria, very little is known about the important problem of incorporating the information quality aspect into query planning. In this paper we desc...

Data Warehouse Evolution: Trade-Offs between Quality and Cost of Query Rewritings

Query rewriting has been used as a query optimization technique for several decades to reduce the computational cost of a query. It has generally been assumed that any rewritten query will generate the identical query result as the original query, in terms of both the query interface and the query extent. Hence, this is called “equivalent query rewriting”. Recently, query rewriting with relaxed...

ELITE: An Entailment-Based Federated Query Engine for Complete and Transparent Semantic Data Integration

In recent years the core of the semantic web has evolved into a conceptual layer built by a set of ontologies mapped onto data distributed in numerous data sources, interlinked, interpreted and processed in terms of semantics. One of the central issues in this context became the federated querying of such linked data. This paper presents the federated query engine ELITE that facilitates a compl...

Design and Evaluation of an IR-Benchmark for SPARQL Fulltext Queries

In this thesis, we design a new IR-benchmark that aims to bridge the prevailing gap between traditional keyword-based retrieval techniques and semantic web-based retrieval techniques. We present a unique, entity-centric data collection, coined Wikipedia-LOD, that aims to combine the benefits of both text-oriented and structured retrieval settings. This collection combines RDF data from DBpedia ...

Data Warehouse Evolution: Trade-offs between Quality and Cost of Query

The problem of rewriting queries has been heavily explored in recent years, including in work on query processing and optimization, semantic query refinement in decentralized environments, the rewriting of queries using views, and view maintenance. Previous work has made the restricting assumption that the rewritten query must be equivalent to the initially given query. We now propose to relax t...

Publication date: 2018